Abstract

How to best design a communication architecture is becoming increasingly
important for evolving autonomous multiagent systems. Directional
reception of signals, a design feature of communication that
appears in most animals, is present in only some existing artificial
communication systems. This paper hypothesizes that such directional
reception benefits the evolution of communicating autonomous agents
because it simplifies the language required to express positional
information, which is critical to solving many group coordination tasks. This
hypothesis is tested by comparing the evolutionary performance of several
alternative communication architectures (both directional and
non-directional) in a multiagent foraging domain designed to require a basic
“come here” type of signal for the optimal solution. Results indicate that
directional reception is a key ingredient in the evolutionary tractability
of effective communication. Furthermore, the real world viability of
directional communication is demonstrated through the successful transfer
of the best evolved controllers to real robots. The conclusion is that
directional reception is an important language feature to consider when
designing communication architectures for more complicated tasks in the
future.
1  Introduction


Because  there  exists  a  substantial  class  of  tasks  that  require  more  than  one  agent 
to solve [26], evolving multiagent robot teams has inevitably attracted significant
interest.  Due  to  the  limitations  inherent  in  centralized  control  systems  [17], 
a  popular  alternative  is  to  evolve  controllers  for  multiple  autonomous  agents 
that  confront  the  challenge  of  group  coordination  via  local  interactions.  While 
some  researchers  have  built  controllers  for  non-communicating  autonomous 
agents [4, 5], communication can potentially improve group coordination through
sharing  information  about  the  sensed  environment  [16]  or  agent  states  [10].  The 
question  then  becomes  how  to  design  an  effective  communication  scheme.  While 
there  are  many  important  design  features  of  communication  systems  to  consider 
[13],  this  paper  focuses  on  one  of  the  most  basic  features:  directional  reception 
of  signals.  When  this  feature  is  present,  agents  can  perceive  the  direction  from 
which  incoming  signals  originate. 

Interestingly,  both  communication  systems  without  directional  reception 
[3,  16,  29]  and  with  directional  reception  [7,  8,  18,  31]  have  been  proposed;  both 
types  of  communication  systems  achieve  good  results  in  their  respective  domains. 
Yet  despite  the  interest  in  both  communication  schemes  with  and  without 
directional  reception,  few  empirical  studies  have  investigated  the  importance 
of  directionality.  The  hypothesis  in  this  paper  is  that  directional  reception  is 
beneficial  to  the  evolution  of  communicating  autonomous  agents.  Consider  the 
following  thought  experiment:  Imagine  that  you  are  trying  to  help  a  friend 
who  is  on  the  other  side  of  a  crowded  room  to  come  to  your  location.  If  you 
can  only  communicate  through  text  messages,  the  message  you  send  may  be 
similar  to  “come  to  the  southwest  corner  of  the  room,  near  the  grey  statue,” 
or,  “turn  120  degrees  to  your  left  and  walk  towards  me.”  However,  by  shouting 
across  the  room,  you  can  simply  say  “come  here.”  The  reason  for  the  reduction 
in  message  complexity  is  that  the  positional  information  that  is  explicit  in 
the  text  messages  is  implicit  in  the  verbal  message  because  humans  benefit 
from  directional  reception  of  auditory  signals.  Similarly,  agents  equipped  with 
directional  reception  should  be  able  to  communicate  with  less  complex  language 
than  those  without  directional  reception,  which  is  beneficial  for  evolved  controllers 
in  physical  environments  because  simpler  languages  are  easier  to  evolve. 

This hypothesis is validated by an experiment comparing the evolutionary
performance of directional and non-directional communication schemes in
a multiagent foraging domain designed to require a “come here” type of signal.
Controllers in the experiment are evolved with the HyperNEAT algorithm
for evolving large-scale ANNs [11, 22], which has been successfully applied to
multiagent domains in the past [3-5]. Unlike many neuroevolution methods,
HyperNEAT can be informed by geometric information in the domain, making
it an ideal platform for testing directional communication. Experimental
results support the hypothesis that directional communication schemes are more
evolutionarily tractable than their non-directional counterparts, suggesting that
directional reception is important for enabling the evolution of group coordination
tasks because it redirects evolutionary effort from learning a complex
language to learning task-critical behaviors. Finally, the practical value of the
evolved controllers is demonstrated by transferring the best teams to real-world
Khepera III robots.

It  is  important  to  note  that  while  the  main  hypothesis  may  be  intuitive, 
many  robot  coordination  tasks  in  the  past  have  lacked  such  directional  reception 
[3,  16,  29].  Thus  this  paper  makes  a  practical  contribution  by  serving  as  a 
reminder  that  directionality  is  often  more  important  than  linguistic  complexity 
when  contemplating  the  setup  of  future  experiments  in  evolved  coordination. 


2  Background 

This  section  reviews  prior  work  in  training  communicating  agents  and  summarizes 
the  HyperNEAT  neuroevolution  algorithm  featured  in  the  experiments  presented 
in  this  paper. 

2.1  Communication  in  Multiagent  Teams 

Within  the  field  of  evolutionary  computation,  communication  among  agents  has 
been  studied  from  several  perspectives.  One  of  the  most  popular  questions  is  how 
natural  communication  may  have  emerged  under  evolutionary  pressure;  Wagner 
et  al.  [28]  provide  a  detailed  review  of  past  work  in  this  area  on  the  emergence  of 
communication.  One  aspect  of  communication  that  is  often  present  in  simulated 
models  but  is  rarely  the  subject  of  empirical  testing  is  directional  reception, 
which  is  the  ability  of  an  agent,  upon  hearing  a  message,  to  sense  the  direction 
of  the  message’s  source.  Directional  reception  is  one  of  the  13  design  features 
of  animal  communication  identified  by  Hockett  [13]  and  is  present  in  even  the 
least advanced communicating mammals. Perhaps its importance is rarely
tested because, when tracing the evolution of humans back to land mammals,
directional reception co-occurs with communication simply as an immediate
consequence of having ears.

Communication  is  also  often  important  for  agents  in  a  cooperative  multiagent 
team.  One  situation  in  which  agents  benefit  from  communication  is  when  each 
agent  senses  only  a  fraction  of  the  observable  state  of  the  environment.  In 
such  tasks,  communication  facilitates  sharing  information  among  team  members 
to  increase  the  amount  of  environmental  context  available  to  each  agent  [16]. 
Communication  may  also  be  necessary  in  tasks  in  which  agents  have  limited 
information  about  the  state  of  other  agents  on  the  team,  such  as  when  an  agent 
requires  the  assistance  of  other  agents  to  solve  a  subgoal  [10]. 

Realizing  the  importance  of  communication  in  solving  cooperative  tasks, 
researchers  developing  evolved  or  learned  controllers  for  cooperative  teams  of 
robots  have  experimented  with  different  forms  of  communication,  many  of  which 
include a form of directional reception. For example, Yong and Miikkulainen [31]
developed neural controllers that received as input the exact positions of all
team members. Marocco and Nolfi [18] developed simulated robots with a single
communication output that produces signals whose intensity and direction are
perceived by team members through directional pie-slice sensors. Di Paolo [7]
developed robots that could “speak” using simulated sound waves and that could
thereby infer the relative position of a speaker through a set of ears positioned on
opposite sides of the body. Floreano et al. [8] evolved physical robot controllers
that communicate their positions through light.

Considerable  effort  has  also  been  devoted  to  developing  communicating  agents 
that  are  not  aware  of  the  relative  position  of  the  speaker.  In  pioneering  work 
on  the  evolution  of  communication,  Werner  and  Dyer  [29]  evolved  agents  that 
communicate  via  three-bit  binary  signals  to  solve  a  task  in  which  stationary 
“female”  agents  must  guide  blind  “male”  agents  to  their  position  on  a  discretized 
two-dimensional  grid.  Jim  and  Giles  [16]  evolved  teams  of  agents  to  solve  a 
discrete  predator-prey  task;  each  agent  in  the  domain  reads  and  writes  bit  strings 
to  a  shared  message  board.  In  a  recent  work,  D’Ambrosio  et  al.  [3]  solved  a 
multiagent  synchronization  task  by  evolving  a  neural  controller  that  features 
direct  neural  connections  between  agents.  In  each  of  Werner  and  Dyer  [29],  Jim 
and  Giles  [16],  and  D’Ambrosio  et  al.  [3],  positional  information  is  critically 
important  to  solving  the  task  even  though  it  is  not  explicitly  included  in  the 
communication  scheme.  The  binary-string  languages  that  subsequently  emerged 
in  Werner  and  Dyer  [29]  and  Jim  and  Giles  [16]  attached  positional  information 
to  certain  “words”  in  the  language.  D’Ambrosio  et  al.  [3]  in  contrast  achieved 
a  solution  by  extracting  positional  information  from  assumptions  about  the 
starting  positions  and  orientations  of  the  agents. 

This  paper  advances  the  hypothesis  that  knowing  the  relative  location  of 
a  speaker  is  often  critically  important  in  solving  group  coordination  tasks  and 
that  it  is  beneficial  to  include  such  information  implicitly  in  the  communication 
scheme  so  that  evolutionary  effort  need  not  be  expended  incorporating  it  into  the 
emergent  language.  A  recent  neuroevolution  technique  called  HyperNEAT  [11, 
22]  easily  facilitates  the  evolution  of  agents  with  a  position-aware  communication 
scheme  because  HyperNEAT  can  take  into  account  geometric  information  about 
the  domain  (such  as  the  spatial  relationship  between  an  agent’s  vision  or  audition 
sensors),  making  it  a  good  platform  for  testing  this  hypothesis.  The  following 
sections  review  the  HyperNEAT  approach,  which  is  applied  in  the  experiments 
in  this  paper. 

2.2  Neuroevolution  of  Augmenting  Topologies 

The HyperNEAT approach is itself an extension of the original NEAT
(Neuroevolution of Augmenting Topologies) algorithm that evolves increasingly large
ANNs [23, 25]. NEAT starts with a population of simple networks that then
increase in complexity over generations by adding new nodes and connections
through mutations. By evolving ANNs in this way, the topology of the network
does not need to be known a priori; NEAT searches through increasingly complex
networks to find a suitable level of complexity. Because it starts simply
and gradually adds complexity, it tends to find a solution network close to
the minimal necessary size. However, as explained in the next section, the
direct representation of nodes and connections in the NEAT genome cannot
scale up to larger networks that can take advantage of domain geometry. For a
complete overview of NEAT, see Stanley and Miikkulainen [23, 25].
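The complexification step described above can be illustrated with a minimal sketch of an add-node mutation. The sketch assumes the usual NEAT convention of splitting an existing connection; the `Connection` class and the `add_node` helper are hypothetical names chosen for illustration, and innovation numbers, speciation, and crossover are omitted entirely.

```python
import random
from dataclasses import dataclass


@dataclass
class Connection:
    src: int
    dst: int
    weight: float
    enabled: bool = True


def add_node(connections, next_node_id, rng=random):
    """Split one enabled connection by inserting a new node (illustrative only)."""
    old = rng.choice([c for c in connections if c.enabled])
    old.enabled = False  # the original link is disabled rather than deleted
    # The incoming link gets weight 1.0 and the outgoing link inherits the old
    # weight, so the new structure initially behaves much like the old link.
    connections.append(Connection(old.src, next_node_id, 1.0))
    connections.append(Connection(next_node_id, old.dst, old.weight))
    return next_node_id + 1


# A tiny genome: two inputs (0, 1) feeding one output (2).
genome = [Connection(0, 2, 0.7), Connection(1, 2, -0.4)]
next_id = add_node(genome, next_node_id=3, rng=random.Random(0))
```

Each such mutation grows the network by one node and two connections, which is how a population that starts simple can gradually reach the complexity a task requires.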

2.3  HyperNEAT

Like NEAT, many neuroevolution methods are directly encoded, which means
each component of the phenotype is encoded by a single gene, making the
discovery of repeating motifs expensive and improbable [30]. Therefore, indirect
encodings  [1,  2,  14,  19,  24]  have  become  a  growing  area  of  interest  in  evolutionary 
computation. 

One such indirect encoding designed explicitly for neural networks is
Hypercube-based NEAT (HyperNEAT) [11, 22], which is itself an indirect extension
of the directly-encoded NEAT approach [23, 25] reviewed in the previous
section. This section briefly reviews HyperNEAT; a complete introduction can
be found in Stanley et al. [22] and Gauci and Stanley [11]. Rather than expressing
connection weights as independent parameters in the genome, HyperNEAT
allows them to vary across the phenotype in a regular pattern through an indirect
encoding called a compositional pattern producing network (CPPN; [21]), which
is like an ANN, but with specially-chosen activation functions.

CPPNs  in  HyperNEAT  encode  the  connectivity  patterns  of  ANNs  as  a 
function  of  geometry.  That  is,  if  an  ANN’s  nodes  are  embedded  in  a  geometry, 
i.e.  assigned  coordinates  within  a  space,  then  it  is  possible  to  represent  its 
connectivity  as  a  single  evolved  function  of  such  coordinates.  In  effect  the  CPPN 
paints  a  pattern  of  weights  across  the  geometry  of  a  neural  network.  Because  the 
CPPN  encoding  is  itself  a  network,  it  is  evolved  in  HyperNEAT  by  the  NEAT 
algorithm,  which  is  designed  to  evolve  networks  of  increasing  complexity.  To 
understand  why  this  approach  is  promising,  consider  that  a  natural  organism’s 
brain  is  physically  embedded  within  a  three-dimensional  geometric  space,  and 
that  such  embedding  heavily  constrains  and  influences  the  brain’s  connectivity. 
Topographic  maps  (i.e.  ordered  projections  of  sensory  or  effector  systems  such 
as  the  retina  or  musculature)  in  natural  brains  preserve  geometric  relationships 
between  high-dimensional  sensor  and  effector  fields  [15,  27].  In  other  words, 
there  is  important  information  implicit  in  geometry  that  can  only  be  exploited 
by  an  encoding  informed  by  such  geometry. 

In  particular,  geometric  regularities  such  as  symmetry  or  repetition  are 
pervasive  throughout  the  connectivity  of  natural  brains.  To  similarly  achieve 
such  regularities,  CPPNs  exploit  activation  functions  that  induce  regularities 
in  HyperNEAT  networks.  The  general  idea  is  that  a  CPPN  takes  as  input  the 
geometric coordinates of two nodes embedded in the substrate, i.e. an ANN
situated  in  a  particular  geometry,  and  outputs  the  weight  of  the  connection 
between  those  two  nodes  (figure  1).  In  this  way,  a  Gaussian  activation  function 
by  virtue  of  its  symmetry  can  induce  symmetric  connectivity  and  a  sine  function 
can  induce  networks  with  repeated  elements.  Note  that  because  the  size  of  the 
CPPN  is  decoupled  from  the  size  of  the  substrate,  HyperNEAT  can  compactly 
encode  the  connectivity  of  an  arbitrarily  large  substrate  with  a  single  CPPN. 
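As a rough illustration of how a CPPN is queried for substrate weights, the sketch below uses a hand-written function in place of an evolved CPPN. The function `toy_cppn`, the node coordinates, and the particular combination of Gaussian and sine are assumptions made for this example only; an actual CPPN is a network evolved by NEAT.

```python
import math


def gaussian(x):
    return math.exp(-x * x)


def toy_cppn(x1, y1, x2, y2):
    """Hand-written stand-in for an evolved CPPN: maps two positions to a weight."""
    # The Gaussian depends only on the squared x-offset, so mirroring both
    # nodes left-to-right leaves the weight unchanged (symmetry).
    # The sine of the y-offset would repeat across a taller substrate.
    return gaussian(x1 - x2) * math.sin(2.0 * (y2 - y1))


def build_weights(sources, targets, cppn):
    """Query the CPPN once per (source, target) pair of substrate nodes."""
    return {
        (i, j): cppn(sx, sy, tx, ty)
        for i, (sx, sy) in enumerate(sources)
        for j, (tx, ty) in enumerate(targets)
    }


# Three input nodes on the line y = -1 and three output nodes on y = 1.
inputs = [(-1.0, -1.0), (0.0, -1.0), (1.0, -1.0)]
outputs = [(-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
weights = build_weights(inputs, outputs, toy_cppn)

# The same function can be queried on a denser substrate without any change.
dense_inputs = [(x / 10.0, -1.0) for x in range(-10, 11)]
dense_weights = build_weights(dense_inputs, outputs, toy_cppn)
```

Because the weight pattern is a function of coordinates rather than a list of independent parameters, the number of substrate connections can grow (9 versus 63 here) while the encoding stays the same size.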
Figure 1: HyperNEAT example. An example substrate (left) for a simple
ANN contains ten neurons that have been assigned (x, y) coordinates. The
weight of every connection specified in the substrate is determined by the evolved
CPPN (right): (1) The coordinates of the source (x1, y1) and target (x2, y2)
neurons are input into the CPPN, (2) the CPPN is activated, and (3) the weight
w of the connection being queried is set to the CPPN’s output. CPPN activation
functions in this paper can be sigmoid (Sig), Gaussian (G), linear (L), or sine
(Sin).


Additionally,  HyperNEAT  can  evolve  controllers  for  teams  of  agents.  This 
multiagent  HyperNEAT  algorithm  was  first  introduced  by  D’Ambrosio  and 
Stanley  [5],  D’Ambrosio  et  al.  [4],  and  D’Ambrosio  and  Stanley  [6].  It  can 
be  used  to  evolve  both  homogeneous  and  heterogeneous  teams;  however,  the 
experiments  in  this  paper  only  necessitate  the  homogeneous  case.  Controllers  for 
homogeneous  teams  are  created  by  evolving  a  single  controller  that  is  duplicated 
for  each  agent  on  the  team. 

3  Experiment 

Each of the communication schemes reviewed in Section 2.1 that succeeds without
directional reception shares a common feature: the domain in which it is
applied is simplified. For example, the simulated worlds in Werner and Dyer [29]
and  Jim  and  Giles  [16]  consist  of  discrete  two-dimensional  grids,  and  while 
the  world  in  D’Ambrosio  et  al.  [3]  is  continuous,  agents  are  restricted  to  only 
forward-backward  movement,  effectively  narrowing  actions  to  one  dimension. 
To  test  the  hypothesis  that  non-directional  communication  schemes  require 
more  evolutionary  effort  than  their  directional  counterparts,  a  more  challenging 
domain  is  proposed  that  takes  place  in  a  continuous  two-dimensional  world. 

A  team  of  five  agents  is  placed  in  a  large  rectangular  room  bounded  on  all 
four  sides  by  walls  but  otherwise  free  of  obstructions.  Agents  must  collect  the 
greatest  number  of  food  items  possible  from  the  room  within  a  time  limit.  Only 
one  food  item  is  present  in  the  room  at  a  given  time;  whenever  one  is  collected, 
a new one spawns randomly somewhere in the room. The key mechanism
encouraging  communication  is  that  food  items  are  not  collected  until  they  are 
touched  by  three  agents  simultaneously.  While  it  is  possible  for  all  three  agents 
to  arrive  at  a  food  item  through  random  wandering  (without  communication), 
the  second  and  third  agents  can  find  the  food  much  faster  if  the  first  agent 
to  find  it  signals  its  location.  The  type  of  communication  that  is  required  for 
the  optimal  behavior  is  thus  relatively  unsophisticated:  agents  must  produce  a 
come  here  signal  upon  discovering  a  food  item.  However,  this  type  of  signal  is 
significantly  more  difficult  to  produce  when  directional  reception  is  not  embedded 
in  the  communication  architecture  because  extra  “words”  must  be  discovered 
to  describe  the  location  of  “here”  (a  concept  that  is  implicit  within  directional 
reception).  While  this  domain  is  relatively  simple,  the  underlying  dynamic 
of  rallying  a  group  of  cooperating  agents  to  salient  locations  in  real  time  is 
fundamental  to  a  range  of  important  real  world  tasks  such  as  rescue,  patrol, 
and  retrieval  tasks.  In  this  way,  this  experiment  helps  to  highlight  the  potential 
importance  of  directional  communication  across  a  range  of  real-world  problems. 

In  addition  to  a  control  scheme  with  no  communication  (NoCom),  three 
different  communication  schemes  are  tested  to  determine  the  extent  to  which 
directional  reception  is  important  in  solving  this  group  foraging  task.  In  the 
first  scheme,  DirCom,  agents  can  emit  signals  on  one  channel  (which  can  range 
from  0.0  to  1.0)  and  can  hear  signals  coming  from  one  of  ten  directions  (equally 
spaced  around  the  agent)  via  an  array  of  ten  pie-slice  sensors.  In  the  second 
communication  scheme,  OneBit,  agents  can  also  emit  signals  on  only  one  channel, 
but  cannot  hear  signals  directionally.  Instead,  agents  have  five  communication 
inputs  for  hearing  signals  from  each  of  the  five  agents  on  the  team.  The  input 
for  sensing  an  agent’s  own  signals  is  disabled  for  consistency  with  the  DirCom 
scheme,  in  which  agents  cannot  hear  themselves  either.  The  final  communication 
scheme,  FiveBit,  is  the  same  as  OneBit  except  that  agents  can  emit  signals 
on  five  channels  and  have  an  array  of  25  communication  inputs  (one  for  each 
channel  on  each  agent).  The  extra  “bits”  can  potentially  make  possible  a  more 
complex  language  capable  of  expressing  the  positional  information  necessary  to 
compensate  for  a  lack  of  directional  reception.  However,  whether  such  a  language 
is  evolutionarily  tractable  will  be  determined  experimentally. 
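The difference between the directional and non-directional inputs can be sketched as follows. This is only an illustration under several assumptions not stated in the text: slice 0 is assumed to begin at the listener's heading and slices are numbered counterclockwise, signals are assumed not to attenuate with distance, and when two speakers fall in the same slice the stronger signal is kept. The function names are hypothetical.

```python
import math

NUM_SLICES = 10  # ten equal pie-slice sensors around the agent (DirCom)


def dircom_inputs(listener_xy, listener_heading, speakers):
    """speakers: list of ((x, y), signal) pairs for the other agents."""
    inputs = [0.0] * NUM_SLICES
    slice_width = 2.0 * math.pi / NUM_SLICES
    for (sx, sy), signal in speakers:
        bearing = math.atan2(sy - listener_xy[1], sx - listener_xy[0])
        relative = (bearing - listener_heading) % (2.0 * math.pi)
        idx = int(relative // slice_width) % NUM_SLICES
        inputs[idx] = max(inputs[idx], signal)  # assumption: strongest signal wins
    return inputs


def onebit_inputs(listener_id, signals):
    """signals: one value per agent, indexed by team position (OneBit)."""
    # An agent's own input is disabled; position plays no role at all.
    return [0.0 if i == listener_id else s for i, s in enumerate(signals)]


listener = (0.0, 0.0)
# The same signal (0.8) arriving from the left and then from the right.
from_left = dircom_inputs(listener, 0.0, [((0.0, 10.0), 0.8)])
from_right = dircom_inputs(listener, 0.0, [((0.0, -10.0), 0.8)])
non_directional = onebit_inputs(0, [0.3, 0.8, 0.0, 0.0, 0.0])
```

In the directional case the active input changes with the speaker's position (slice 2 versus slice 7 here), whereas the OneBit input vector is identical wherever the speaker stands, so any positional information would have to be carried by the signal values themselves.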

Individual  agents  are  equipped  with  several  arrays  of  pie-slice  sensors  for 
the  various  senses  that  serve  as  inputs  into  a  HyperNEAT-evolved  ANN.  The 
HyperNEAT  substrate  that  serves  as  the  basis  for  all  the  variant  setups  is  shown 
in figure 2. This kind of multi-spatial substrate was shown to be effective for tasks with
multiple modalities in Pugh and Stanley [20]. Agents sense the location of food
items  within  a  maximum  radius  of  100  units  with  a  set  of  five  equal-size  pie-slice 
sensors  that  span  the  frontal  180  degrees  of  vision.  Similarly,  agents  detect  walls 
with  a  set  of  five  100-unit  rangefinders  arranged  across  the  front  of  each  agent  in 
36 degree intervals. Each agent also detects the location of other agents with a
set of ten unlimited-range pie-slice sensors that surround the agent (see footnote 1).
Each agent has the ability to move forward and to turn left or right (a set of three
output neurons controls the movement: left, forward, and right, where the direction
and magnitude of each turn are decided by the difference between the left and right
turn outputs).

1  Activation of the friend sensors that detect other agents is floored at 20% if sensing an
agent more than 500 units away.
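One plausible reading of the movement rule is sketched below. The limits of 5 units and 36 degrees per tick come from the paper's footnote on agent movement; everything else is an assumption made for illustration, including outputs in the range 0 to 1, a positive difference meaning a counterclockwise (left) turn, and the turn being applied before the forward step. The `step` function is a hypothetical name.

```python
import math

MAX_SPEED = 5.0                # units per tick (from the paper's footnote)
MAX_TURN = math.radians(36.0)  # radians per tick (from the paper's footnote)


def clamp(value, low, high):
    return max(low, min(high, value))


def step(x, y, heading, left, forward, right):
    """Advance one tick given the three movement outputs (assumed in [0, 1])."""
    # The sign of (left - right) sets the turn direction, its size the magnitude.
    turn = clamp(left - right, -1.0, 1.0) * MAX_TURN
    heading = (heading + turn) % (2.0 * math.pi)
    speed = clamp(forward, 0.0, 1.0) * MAX_SPEED
    return x + speed * math.cos(heading), y + speed * math.sin(heading), heading


# Equal left and right outputs cancel, so the agent drives straight ahead.
straight = step(0.0, 0.0, 0.0, left=0.5, forward=1.0, right=0.5)
# Full left output with no forward output turns in place by the maximum amount.
spin = step(0.0, 0.0, 0.0, left=1.0, forward=0.0, right=0.0)
```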

DirCom, OneBit, and FiveBit agents are identical in every way except for
their communication schemes, which differ as follows. DirCom (figure 3a) and
OneBit  (figure  3b)  agents  each  have  a  single  output  neuron  for  sending  simple 
communication  signals,  while  FiveBit  (figure  3c)  has  a  set  of  five  output  neurons 
for  sending  more  complex  signals.  DirCom  agents  have  a  set  of  ten  input 
neurons  for  receiving  communication  signals  (figure  4a),  one  for  each  pie-slice  in 
their  360  degree  non-overlapping  array.  OneBit  agents  have  a  set  of  five  input 
neurons  for  receiving  communication  signals  (figure  4b),  one  for  each  agent  on  the 
team  (one  of  which  is  never  activated  because  agents  cannot  hear  themselves). 
FiveBit  agents  have  a  set  of  25  input  neurons  for  receiving  communication 
signals  (figure  4c),  five  for  each  of  five  agents  on  the  team  (five  of  which  are 
never activated). Recall that because HyperNEAT is an indirect encoding, the
dimensionality of the inputs is not an obstacle to effective learning [22].

The  room  is  1,000  by  900  units,  and  the  five  agents  are  initially  arranged  in 
a  horizontal  line  in  the  center  of  the  room  spaced  100  units  apart.  Food  items 
spawn  randomly  around  the  room,  not  closer  than  40  units  from  a  wall.  During 
evolution,  team  performance  is  averaged  over  20  trials  to  mitigate  the  evaluation 
noise  caused  by  randomized  food  item  spawn  points.  Each  trial,  teams  are  given 
2,000  ticks  of  simulation  time  to  collect  as  many  food  items  as  possible,  up  to  a 
maximum  of  ten  food  items.  Team  fitness  on  a  particular  trial  is  determined 
by  the  number  of  food  items  seen,  the  number  collected,  and  the  time  at  which 
each  was  collected.  More  specifically,  each  food  item  is  worth  a  maximum  of  100 
points,  10  of  which  are  awarded  when  a  single  agent  comes  within  range  (the 
food  is  “seen”),  40  of  which  are  awarded  when  three  agents  come  within  range 
(thus  collecting  the  food),  and  50  of  which  are  time  dependent  (50  points  are 
awarded  if  the  food  is  collected  on  the  first  tick  of  simulation,  diminishing  to  0 
points  awarded  if  the  food  is  collected  on  the  last  tick).  The  time  component  is 
included  to  provide  a  smoother  fitness  gradient  for  evolution  to  follow. 
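The scoring rule can be written out as a short sketch. The text states only that the time-dependent component falls from 50 points on the first tick to 0 on the last; the linear interpolation used here is an assumption, as are the zero-based tick indexing and the function names.

```python
TICKS_PER_TRIAL = 2000  # simulation ticks available in each training trial


def food_item_score(seen, collected_tick, total_ticks=TICKS_PER_TRIAL):
    """Score one food item; collected_tick is None if it was never collected."""
    score = 0.0
    if seen:
        score += 10.0  # a single agent came within range
    if collected_tick is not None:
        score += 40.0  # three agents came within range
        # Assumption: linear decay from 50 (tick 0) to 0 (final tick).
        remaining = total_ticks - 1 - collected_tick
        score += 50.0 * remaining / (total_ticks - 1)
    return score


def trial_fitness(items):
    """items: list of (seen, collected_tick) pairs, at most ten per trial."""
    return sum(food_item_score(seen, tick) for seen, tick in items)


def team_fitness(trials):
    """Average over trials (20 during evolution) to reduce evaluation noise."""
    return sum(trial_fitness(items) for items in trials) / len(trials)


# One item collected halfway through, and one that was seen but never collected.
example = trial_fitness([(True, 1000), (True, None)])
```

Under these assumptions a team that only spots food still earns a small reward, which is what gives evolution a gradient to follow before full collection behavior appears.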

Evolution  is  run  for  1,000  generations,  by  which  time  fitness  inevitably  stops 
improving.  The  training  phase  consists  of  20  runs  of  evolution  for  each  of  the 
three  communication  schemes,  after  which  the  champions  are  tested  according 
to  a  more  stable  metric:  Testing  performance  is  determined  by  the  raw  number 
of  food  items  collected  in  5,000  time  ticks,  averaged  over  10,000  trials.  During 
testing,  there  is  no  artificial  limit  of  10  food  items,  although  there  are  practical 
limits due to the time it takes agents to travel across the room (see footnote 2).

Finally,  to  highlight  the  practical  value  of  effective  communication,  the  best 
performing  teams  are  implemented  on  real-world  Khepera  III  robots.  In  the 
real  world  implementation,  several  sensor  types  are  implemented  by  a  central 
station that reads each agent’s position and orientation information on every tick
via odometry on their wheel encoders. This information is synthesized to yield
communication sensors, friend vision, and target vision. Hardware IR sensors
serve as the wall sensors. While in practice positional information in the real
world may not always be possible to compute through a central computing node,
it is important to note that this setup approximates other real world setups that
would convey similar information, such as true hardware-based emitters [8] or
decentralized mutual localization techniques [9].

2  Agents move at a maximum rate of 5 units per tick and have a maximum turn rate of 36
degrees per tick.

Figure 2: HyperNEAT substrate (all communication schemes). Each
input and output modality and each hidden layer is placed on its own plane
within the substrate, following the multi-spatial substrate configuration described
in Pugh and Stanley [20]. The substrate is strictly feedforward: Each input
modality feeds into a dedicated “level 1” hidden layer, all level 1 hidden layers
feed into a common “level 2” hidden layer, and all hidden layers (level 1 as well
as level 2) feed into all outputs. Individual neural connections are omitted for
clarity. Instead, arrows indicate the existence of neural connections between two
planes. Planes shown as connected are potentially fully connected (all neurons
on the first plane are queried by HyperNEAT for connections to all neurons on
the second plane). Communication input and output planes vary depending
on communication scheme; see figure 3 for output planes and figure 4 for input
planes.

Figure 3: Substrate: communication output planes. The number of
neurons on the communication output plane corresponds to the size of the signal
vector sent to all other agents on the team on each tick: one for DirCom (a) and
OneBit (b), and five for FiveBit (c).

It  is  also  important  to  note  that  in  the  real  world  the  IR  sensors  also  detect  the 
presence  of  other  robots  as  well  as  the  target  points  (plastic  cups),  causing  those 
entities  to  be  treated  like  walls.  This  ambiguity  is  contrary  to  the  case  during 
evolution  where  wall  sensors  only  activate  when  in  range  of  a  wall.  To  partially 
mitigate  this  potential  discrepancy,  walls  in  the  real  world  are  covered  with 
retroreflective  tape  while  robots  and  target  points  are  not.  Thus  agents  perceive 
a  strong  activation  when  seeing  walls  and  a  comparatively  weak  activation  when 
seeing  other  solid  objects  in  the  room.  The  evolved  policies  are  robust  enough 
to  work  in  the  real  world  despite  any  remaining  differences. 

Because  HyperNEAT  differs  from  original  NEAT  only  in  its  set  of  activation 
functions,  it  uses  the  same  parameters  [23].  The  experiment  was  run  with  a 
modified  version  of  the  public  domain  SharpNEAT  package  [12].  The  size  of  the 
population  was  500  with  20%  elitism.  Sexual  offspring  (50%)  did  not  undergo 
mutation.  Asexual  offspring  (50%)  had  0.96  probability  of  link  weight  mutation, 
0.03  chance  of  link  addition,  and  0.01  chance  of  node  addition.  The  coefficients 
for  determining  species  similarity  were  1.0  for  nodes  and  connections  and  0.1  for 
weights.  The  available  CPPN  activation  functions  were  sigmoid,  Gaussian,  linear, 
and  sine,  all  with  equal  probability  of  being  added  to  the  CPPN.  Parameter 
settings  are  based  on  standard  SharpNEAT  defaults  and  prior  reported  settings 
for  NEAT  [23,  25].  They  were  found  to  be  robust  to  moderate  variation  through 
preliminary  experimentation. 
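For reference, the reported settings can be collected in one place as in the sketch below. The field names are hypothetical and do not correspond to SharpNEAT's actual configuration keys. The sampling helper additionally assumes that the three asexual mutation types are mutually exclusive, which the text does not state but which is consistent with their probabilities summing to one.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EvolutionParams:
    population_size: int = 500
    elitism: float = 0.20
    sexual_fraction: float = 0.50    # sexual offspring are not mutated
    asexual_fraction: float = 0.50
    p_weight_mutation: float = 0.96  # probabilities for asexual offspring
    p_add_link: float = 0.03
    p_add_node: float = 0.01
    compat_nodes_and_links: float = 1.0  # species similarity coefficients
    compat_weights: float = 0.1
    cppn_activations: tuple = ("sigmoid", "gaussian", "linear", "sine")
    generations: int = 1000


def sample_asexual_mutation(params, rng):
    """Pick one mutation type (assumes the three types are mutually exclusive)."""
    r = rng.random()
    if r < params.p_weight_mutation:
        return "weight"
    if r < params.p_weight_mutation + params.p_add_link:
        return "add_link"
    return "add_node"


params = EvolutionParams()
rng = random.Random(0)
draws = [sample_asexual_mutation(params, rng) for _ in range(10000)]
```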
Figure 4: Substrate: communication input planes. Communication input
for  DirCom  (a)  consists  of  two  sets  of  five  neurons  corresponding  to  the  front  and 
rear  pie-slice  sensors.  Each  neuron  in  the  communication  input  plane  for  OneBit 
(b)  corresponds  to  an  agent  on  the  team  and  is  activated  when  detecting  signals 
from  that  agent.  The  communication  input  plane  for  FiveBit  (c)  consists  of  a 
5x5  grid  arranged  as  five  horizontal  groups.  Each  horizontal  group  corresponds 
to  an  agent  on  the  team,  and  each  neuron  within  a  group  corresponds  to  a 
different  communication  channel. 


4  Results 

For each communication scheme, the best-performing genome found after 1,000
generations of evolution in each of 20 runs is evaluated according to the
testing metric described in the previous section. The test performance for
each communication architecture, averaged over the 20 champions, is presented
in figure 5. The directional communication scheme significantly outperforms
all others (p < 0.05; Student’s t-test), while there is no significant difference
among the OneBit, FiveBit, and NoCom architectures. The best-performing
NoCom champion collects 12.3 food items on average during testing. Nine of the
20 DirCom champions pass a success threshold of collecting more than 105%
of the food collected by the best NoCom champion, whereas OneBit and FiveBit
produce zero and one such successes, respectively.
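
The success threshold follows directly from the reported numbers: 105% of 12.3 is about 12.9 food items. The short Python sketch below works through this arithmetic and checks it against individual champion scores reported later in this section; the constant and function names are hypothetical.

```python
BEST_NOCOM_SCORE = 12.3   # best NoCom champion, from the text
SUCCESS_FACTOR = 1.05     # "more than 105%" of that score


def success_threshold(baseline=BEST_NOCOM_SCORE, factor=SUCCESS_FACTOR):
    """Score a champion must exceed to count as a success."""
    return baseline * factor


def is_success(score):
    """True if the score is strictly above the threshold."""
    return score > success_threshold()


# Individual scores quoted elsewhere in this section.
reported_scores = {
    "OneBit (best two)": [12.7, 12.6],
    "FiveBit (best)": [14.4],
    "DirCom (signaling teams, min and max)": [13.0, 19.3],
}

for label, scores in reported_scores.items():
    print(label, [is_success(s) for s in scores])
```

Both of the best OneBit scores fall below the threshold, while the best FiveBit score and the quoted range of signaling DirCom scores lie above it, in line with the success counts given above.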

The  best  performing  NoCom  teams  exhibit  surprisingly  effective  exploratory 
behavior.  The  very  best  consists  of  a  single  agent  closely  following  the  walls 
and  corners  of  the  arena  while  the  remaining  four  agents  lag  behind  seeking  the 
leading  agent  and  at  the  same  time  checking  the  interior  of  the  room.  If  food  is 
discovered  in  the  interior,  the  swarm  of  agents  quickly  collects  it,  and  if  food 
is  discovered  near  a  wall,  the  wall-exploring  agent  stops  at  it  and  the  swarm 
of  agents  is  able  to  catch  up  and  thus  collect  the  food  in  a  reasonable  time. 
However, this highly coordinated behavior is rare; most NoCom teams achieve
scores closer to 10.0 through random wandering. The best OneBit teams exhibit
a style of coordination similar to that of the best NoCom team, with scores of
12.7 and 12.6. Every OneBit team either sends chaotic signals at all times,
usually with no discernible purpose (signals change little if at all when agents
are in range of a target point or wall), or else sends no signals at all. In contrast,
the best FiveBit team produces a rudimentary “come here” signal that allows it
to achieve good results, with a score of 14.4. Agents on this team spread out and
explore the map individually. When an agent finds the food, it sends a signal
that causes all other agents to change behavior and begin to seek nearby agents.
This inevitably causes the agents to capture the food, although it is clear that
they do not know the location of the agent sending the signal because often all
four of the remaining agents will join up before moving to the fifth (and thus
the food). While this strategy is effective, the fact that it emerged in only one
of the FiveBit runs suggests how difficult it is to discover.

Figure 5: Test performance. The testing performance of a single run is
determined by the average number of food items collected over 10,000 trials by
the team with the highest training performance. Results for each communication
scheme are averaged over 20 runs. Error bars depict a 95% confidence interval.
The DirCom scheme performs significantly better than the other schemes (p <
0.05; Student’s t-test).

In contrast to the non-directional schemes, half of the DirCom teams learned
to produce a “come here” signal upon discovering the food. The testing
performance of these teams varied (from 13.0 to 19.3) according to the effectiveness of
non-communicative behaviors such as exploration and the reliability of turning
towards a detected food item without moving past it (and thus out of range).
The best DirCom team, with a test score of 19.3, combines “come here”
signaling with efficient exploratory behaviors (agents tend to spread out as much as
possible).
Figure 6: Real world implementation. In this example, the robots explore
the  arena  by  moving  forward  from  their  starting  positions.  The  third  robot  finds 
the  target  point  (top)  and  signals  its  location  to  the  other  robots  who  then  begin 
moving  towards  it  (bottom). 


The five best performing DirCom teams were transferred to real Khepera III
robots and placed in an open arena containing a single food item. Transferred
teams demonstrated effective group coordination (figure 6), despite significant
differences between the real world and the simulated world (e.g., robots can
collide with each other in the real world, while they cannot do so in simulation).
The real world teams were tested with team sizes of five, four, and three robots
without losing the ability to solve the task. Videos of some of the transferred
teams can be found at http://tinyurl.com/DirComVideo.


5  Discussion 

The results indicate that the communication scheme that includes directional
reception is able to solve the task more efficiently than those schemes without
directional reception. In fact, only one out of 40 teams across both non-directional
communication schemes achieves a level of performance unreachable by
non-communicating teams. By comparison, almost half of the teams with directional
reception achieve such performance. This result demonstrates that even though
it is possible to evolve effective communication without directional reception,
doing so is less evolutionarily tractable than when directional reception is present
in the system.

Messages in a system without directional reception must express directionality
through a complex system of “words” that each contain different positional
information. These words must be invented independently, and they may have
little or no benefit until several have already been invented. This problem is
difficult for evolutionary algorithms because there may exist no path of increasingly
fit stepping stones to the discovery of the final working language. Directional
reception alleviates this problem by incorporating positional information into the
communication architecture itself so that agents automatically learn the entire
vocabulary of positional information as soon as a single word is invented. In this
way, effective language is more evolutionarily tractable because directionality
can be discovered in a single step.

Directional reception can be implemented in the real world in a number of
ways. For example, it is possible to perceive the direction of signals sent via
sound or light [8]. However, such approaches may not be robust to environmental
factors such as bright rooms or poor acoustics. More promisingly, directional
communication can be simulated if robots have an accurate means of localization.
The real world experiment in this paper demonstrates one such approach:
centralized localization based on wheel encoder odometry. However, stronger
methods exist, such as the decentralized localization proposed by
Franchi et al. [9].
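
To illustrate how directional reception might be simulated from localization data, the Python sketch below converts the poses of a receiver and a signaling robot into the index of the pie-slice sensor that should fire. It is a sketch under stated assumptions rather than the implementation used in the experiment: the function names are hypothetical, the front and rear halves are each assumed to be divided into five equal slices, and a sensor is assumed to be either on (1.0) or off (0.0).

```python
import math

SLICES_PER_HALF = 5  # five front and five rear pie-slice sensors


def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def relative_bearing(receiver_xy, receiver_heading, sender_xy):
    """Direction of the sender relative to the receiver's heading."""
    dx = sender_xy[0] - receiver_xy[0]
    dy = sender_xy[1] - receiver_xy[1]
    return wrap(math.atan2(dy, dx) - receiver_heading)


def slice_for_bearing(bearing):
    """Map a bearing to (half, index); equal-width slices are an assumption.

    Within each half, index 0 is the clockwise-most slice and index 4 the
    counter-clockwise-most slice.
    """
    bearing = wrap(bearing)
    half = "front"
    if not (-math.pi / 2 <= bearing < math.pi / 2):
        half = "rear"
        bearing = wrap(bearing - math.pi)  # re-center on the rear axis
    index = int((bearing + math.pi / 2) // (math.pi / SLICES_PER_HALF))
    return half, min(index, SLICES_PER_HALF - 1)


def dircom_inputs(receiver_xy, receiver_heading, signaling_senders):
    """Build front and rear sensor activations from localized sender positions."""
    inputs = {"front": [0.0] * SLICES_PER_HALF, "rear": [0.0] * SLICES_PER_HALF}
    for sender_xy in signaling_senders:
        bearing = relative_bearing(receiver_xy, receiver_heading, sender_xy)
        half, index = slice_for_bearing(bearing)
        inputs[half][index] = 1.0  # assumed binary activation
    return inputs
```

With this scheme, a signaling robot directly ahead of the receiver activates the middle front sensor and one directly behind activates the middle rear sensor, regardless of where the pair is located in the arena.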

Overall, the significant advantage of directional communication over
non-directional approaches in the experiment suggests that directional sensitivity
should at least be considered when first formulating multiagent architectures
for future robot coordination problems. Because realistic real world options for
directional sensing are available, directional schemes merit serious consideration
for complex multiagent tasks.


6  Conclusions 

This  paper  presented  an  empirical  study  of  directional  communication  for  the 
evolution  of  neural  controllers  applied  to  a  group  coordination  task.  A  directional 
communication  scheme  was  compared  to  two  non-directional  schemes  with 
different  potentials  for  message  complexity,  as  well  as  to  a  scheme  in  which 
communication  is  disabled.  The  results  indicate  that  directional  communication 
is  more  evolutionarily  tractable  than  non-directional  approaches  because  it 
eliminates  the  need  to  discover  a  complex  language  to  express  the  positional 
information  that  is  crucial  to  optimally  solving  the  task.  The  real  world  viability 
of  the  directional  communication  scheme  was  demonstrated  by  successfully 
transferring  the  best  evolved  neural  controllers  to  Khepera  III  robots.  Thus 
this  paper  serves  as  a  reminder  of  the  importance  of  directional  reception  and 
suggests that it is an important language feature to consider when designing
communication  architectures  for  more  complicated  tasks  in  the  future. 